Gegenbauer polynomial

Learning single-index models via harmonic decomposition

Joshi, Nirmit, Koubbi, Hugo, Misiakiewicz, Theodor, Srebro, Nathan

arXiv.org Machine Learning

We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that "spherical harmonics" -- rather than "Hermite polynomials" -- provide the natural basis for this problem, as they capture its intrinsic "rotational symmetry". Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators -- based on tensor unfolding and online SGD -- that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.
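To make the Hermite-expansion criterion concrete: under Gaussian inputs, the first nonzero Hermite coefficient of the link function (its information exponent) is the quantity prior work uses to calibrate sample complexity. The sketch below is our own illustration rather than the paper's estimator; it extracts these coefficients numerically via Gauss-Hermite quadrature, and the function name hermite_coeffs is hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_coeffs(link, max_deg=8, quad_deg=80):
    """Coefficients c_k = E[link(g) He_k(g)] / k! for g ~ N(0, 1), computed
    with Gauss-Hermite quadrature for the probabilists' weight exp(-x^2/2)."""
    x, w = He.hermegauss(quad_deg)       # nodes and weights; sum(w) = sqrt(2*pi)
    w = w / np.sqrt(2.0 * np.pi)         # turn the integral into an expectation
    fx = link(x)
    coeffs = []
    for k in range(max_deg + 1):
        basis = np.zeros(max_deg + 1)
        basis[k] = 1.0
        He_k = He.hermeval(x, basis)     # He_k evaluated at the quadrature nodes
        coeffs.append(np.sum(w * fx * He_k) / math.factorial(k))
    return np.array(coeffs)

# Example: link(z) = He_3(z) = z^3 - 3z has information exponent 3.
c = hermite_coeffs(lambda z: z**3 - 3.0 * z)
print(next(k for k in range(1, len(c)) if abs(c[k]) > 1e-8))  # -> 3
```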


Fractal and Regular Geometry of Deep Neural Networks

Di Lillo, Simmaco, Marinucci, Domenico, Salvi, Michele, Vigogna, Stefano

arXiv.org Machine Learning

We study the geometric properties of random neural networks by investigating the boundary volumes of their excursion sets for different activation functions, as the depth increases. More specifically, we show that, for activations which are not very regular (e.g., the Heaviside step function), the boundary volumes exhibit fractal behavior, with their Hausdorff dimension monotonically increasing with the depth. On the other hand, for activations which are more regular (e.g., ReLU, logistic and $\tanh$), as the depth increases, the expected boundary volumes can either converge to zero, remain constant or diverge exponentially, depending on a single spectral parameter which can be easily computed. Our theoretical results are confirmed by numerical experiments based on Monte Carlo simulations.
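As a toy version of the Monte Carlo experiments mentioned above, the probe below counts zero crossings of a random network along a one-dimensional slice of the input plane, a crude proxy for the boundary volume of the excursion set $\{f > 0\}$; the width, depth, and He-style weight scaling are our assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net(x, depth, width=64, act=np.tanh):
    """Random fully connected network with Gaussian weights; x has shape (n, d)."""
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[1]), (h.shape[1], width))
        h = act(h @ W)
    v = rng.normal(0.0, np.sqrt(1.0 / width), (width,))
    return h @ v

# Zero crossings of f along a fine 1-D slice: a crude proxy for the
# boundary volume of the excursion set {f > 0}.
t = np.linspace(-1.0, 1.0, 20000)
slice_2d = np.stack([t, np.zeros_like(t)], axis=1)
heaviside = lambda z: (z > 0).astype(z.dtype)

for depth in (1, 4, 16):
    for name, act in [("tanh", np.tanh), ("heaviside", heaviside)]:
        f = random_net(slice_2d, depth, act=act)
        crossings = int(np.sum(np.sign(f[:-1]) != np.sign(f[1:])))
        print(f"depth={depth:2d}  act={name:9s}  zero crossings ~ {crossings}")
```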


A Physics-Informed Machine Learning Approach for Solving Distributed Order Fractional Differential Equations

Aghaei, Alireza Afzal

arXiv.org Artificial Intelligence

This paper introduces a novel methodology for solving distributed-order fractional differential equations using a physics-informed machine learning framework. The core of this approach involves extending the support vector regression (SVR) algorithm to approximate the unknown solutions of the governing equations during the training phase. By embedding the distributed-order functional equation into the SVR framework, we incorporate physical laws directly into the learning process. To further enhance computational efficiency, Gegenbauer orthogonal polynomials are employed as the kernel function, capitalizing on their fractional differentiation properties to streamline the problem formulation. Finally, the resulting SVR optimization problem is solved either as a quadratic program or as a positive definite system in its dual form. The effectiveness of the proposed approach is validated through a series of numerical experiments on Caputo-based distributed-order fractional differential equations, encompassing both ordinary and partial derivatives.
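A minimal sketch of the kernel-plus-dual-solve ingredient, leaving out the physics-informed fractional-operator constraints that are the paper's actual contribution: a truncated Gegenbauer feature kernel is assembled and the regularized positive definite dual system is solved as in least-squares SVR. The truncation degree, flat weighting over degrees, and target function are illustrative assumptions.

```python
import numpy as np
from scipy.special import eval_gegenbauer

def gegenbauer_kernel(x, y, lam=1.0, deg=10):
    """Truncated Gegenbauer kernel K(x, y) = sum_n C_n^lam(x) C_n^lam(y).
    The flat weighting over degrees is an illustrative choice."""
    Cx = np.stack([eval_gegenbauer(n, lam, x) for n in range(deg + 1)], axis=1)
    Cy = np.stack([eval_gegenbauer(n, lam, y) for n in range(deg + 1)], axis=1)
    return Cx @ Cy.T

# Least-squares SVR surrogate: solve the positive definite dual system
# (K + reg * I) alpha = y, then predict with f(x) = K(x, X) alpha.
X = np.linspace(-1.0, 1.0, 40)
y = np.exp(X) * np.sin(3.0 * X)           # stand-in target, not an FDE solution
K = gegenbauer_kernel(X, X)
reg = 1e-8
alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)

X_test = np.linspace(-1.0, 1.0, 200)
y_pred = gegenbauer_kernel(X_test, X) @ alpha
print(np.max(np.abs(y_pred - np.exp(X_test) * np.sin(3.0 * X_test))))
```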


Gegenbauer Graph Neural Networks for Time-varying Signal Reconstruction

Castro-Correa, Jhon A., Giraldo, Jhony H., Badiey, Mohsen, Malliaros, Fragkiskos D.

arXiv.org Artificial Intelligence

Reconstructing time-varying graph signals (or graph time-series imputation) is a critical problem in machine learning and signal processing with broad applications, ranging from missing-data imputation in sensor networks to time-series forecasting. Accurately capturing the spatio-temporal information inherent in these signals is crucial for effectively addressing these tasks. However, existing approaches relying on smoothness assumptions of temporal differences and simple convex optimization techniques have inherent limitations. To address these challenges, we propose a novel approach that incorporates a learning module to enhance the accuracy of the downstream task. To this end, we introduce the Gegenbauer-based graph convolutional (GegenConv) operator, which generalizes the conventional Chebyshev graph convolution by leveraging the theory of Gegenbauer polynomials. By moving beyond traditional convex formulations, we increase the model's expressive capacity and obtain a more accurate solution for recovering time-varying graph signals. Building upon GegenConv, we design the Gegenbauer-based time Graph Neural Network (GegenGNN) architecture, which adopts an encoder-decoder structure. Our approach further employs a dedicated loss function that combines a mean squared error term with Sobolev smoothness regularization. This combination enables GegenGNN to capture both fidelity to the ground truth and the underlying smoothness properties of the signals, enhancing reconstruction performance. We conduct extensive experiments on real datasets to evaluate the effectiveness of our proposed approach. The experimental results demonstrate that GegenGNN outperforms state-of-the-art methods, showcasing its superior capability in recovering time-varying graph signals.
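A sketch of the core GegenConv idea under our own simplifications: a graph filter $\sum_k \theta_k C_k^{(\alpha)}(\tilde{L})$ built from the Gegenbauer three-term recurrence, applied to a Laplacian rescaled to have spectrum in $[-1, 1]$. At $\alpha = 1$ the recurrence yields Chebyshev polynomials of the second kind (first-kind Chebyshev, used by the conventional filter, arises as the $\alpha \to 0$ limit); the function name and the toy graph are hypothetical, not the authors' code.

```python
import numpy as np

def gegenconv(L_scaled, x, theta, alpha=1.0):
    """y = sum_k theta_k C_k^alpha(L_scaled) x via the Gegenbauer recurrence
    n C_n(t) = 2 t (n + alpha - 1) C_{n-1}(t) - (n + 2 alpha - 2) C_{n-2}(t).
    L_scaled must have spectrum inside [-1, 1]."""
    C_prev2 = x                                  # C_0(L) x = x
    out = theta[0] * C_prev2
    if len(theta) > 1:
        C_prev1 = 2.0 * alpha * (L_scaled @ x)   # C_1(L) x
        out = out + theta[1] * C_prev1
    for k in range(2, len(theta)):
        C_k = (2.0 * (k + alpha - 1.0) * (L_scaled @ C_prev1)
               - (k + 2.0 * alpha - 2.0) * C_prev2) / k
        out = out + theta[k] * C_k
        C_prev2, C_prev1 = C_prev1, C_k
    return out

# Toy usage on a 4-node path graph with the normalized Laplacian.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(4) - d_inv_sqrt @ A @ d_inv_sqrt   # spectrum in [0, 2]
L_scaled = L - np.eye(4)                      # shift the spectrum into [-1, 1]
signal = np.array([1.0, 0.0, 0.0, 0.0])
print(gegenconv(L_scaled, signal, theta=[0.5, 0.3, 0.2]))
```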


An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings

Lu, Yue M., Yau, Horng-Tzer

arXiv.org Machine Learning

We investigate random matrices whose entries are obtained by applying a nonlinear kernel function to pairwise inner products between $n$ independent data vectors, drawn uniformly from the unit sphere in $\mathbb{R}^d$. This study is motivated by applications in machine learning and statistics, where these kernel random matrices and their spectral properties play significant roles. We establish the weak limit of the empirical spectral distribution of these matrices in a polynomial scaling regime, where $d, n \to \infty$ such that $n / d^\ell \to \kappa$, for some fixed $\ell \in \mathbb{N}$ and $\kappa \in (0, \infty)$. Our findings generalize an earlier result by Cheng and Singer, who examined the same model in the linear scaling regime (with $\ell = 1$). Our work reveals an equivalence principle: the spectrum of the random kernel matrix is asymptotically equivalent to that of a simpler matrix model, constructed as a linear combination of a (shifted) Wishart matrix and an independent matrix sampled from the Gaussian orthogonal ensemble. The aspect ratio of the Wishart matrix and the coefficients of the linear combination are determined by $\ell$ and the expansion of the kernel function in the orthogonal Hermite polynomial basis. Consequently, the limiting spectrum of the random kernel matrix can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law. We also extend our results to cases with data vectors sampled from isotropic Gaussian distributions instead of spherical distributions.
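A quick Monte Carlo probe of the setup (not of the equivalence proof): sample spherical data in the $n \asymp \kappa d^\ell$ regime, apply a pure degree-2 Hermite nonlinearity to the rescaled inner products, and inspect the empirical spectrum. The centering and normalization below are illustrative choices and differ from the paper's exact conventions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Polynomial scaling regime: n = kappa * d^ell with ell = 2.
d, ell, kappa = 40, 2, 0.5
n = int(kappa * d**ell)

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform on the unit sphere

G = np.sqrt(d) * (X @ X.T)                      # O(1) off-diagonal inner products
h = lambda t: t**2 - 1.0                        # pure degree-2 Hermite kernel
K = h(G) / np.sqrt(n)
np.fill_diagonal(K, 0.0)                        # keep only the off-diagonal part

eigs = np.linalg.eigvalsh(K)
print(eigs.min(), eigs.max())   # edges of the empirical spectral distribution
```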


Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

Misiakiewicz, Theodor

arXiv.org Machine Learning

We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with entries $h(\langle \textbf{x}_i, \textbf{x}_j \rangle / d)$ where the $(\textbf{x}_i)_{i \leq n}$ are i.i.d.~random covariates in $\mathbb{R}^d$. In the linear high-dimensional regime $n \asymp d$, it was shown that these matrices are well approximated by their linearization, which simplifies into the sum of a rescaled Wishart matrix and the identity matrix. In this paper, we generalize this decomposition to the polynomial high-dimensional regime $n \asymp d^\ell, \ell \in \mathbb{N}$, for data uniformly distributed on the sphere and hypercube. In this regime, the kernel matrix is well approximated by its degree-$\ell$ polynomial approximation and can be decomposed into a low-rank spike matrix, the identity, and a `Gegenbauer matrix' with entries $Q_\ell(\langle \textbf{x}_i, \textbf{x}_j \rangle)$, where $Q_\ell$ is the degree-$\ell$ Gegenbauer polynomial. We show that the spectrum of the Gegenbauer matrix converges in distribution to a Marchenko-Pastur law. This problem is motivated by the study of the prediction error of kernel ridge regression (KRR) in the polynomial regime $n \asymp d^\kappa, \kappa > 0$. Previous work showed that for $\kappa \not\in \mathbb{N}$, KRR fits exactly a degree-$\lfloor \kappa \rfloor$ polynomial approximation to the target function. In this paper, we use our characterization of the kernel matrix to complete this picture and compute the precise asymptotics of the test error in the limit $n / d^\kappa \to \psi$ with $\kappa \in \mathbb{N}$. In this case, the test error can exhibit double descent behavior, depending on the effective regularization and signal-to-noise ratio at level $\kappa$. Because this double descent can occur each time $\kappa$ crosses an integer, it explains the multiple descent phenomenon in the KRR risk curve observed in several previous works.
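The Marchenko-Pastur claim for the Gegenbauer matrix lends itself to a direct simulation. The sketch below uses the addition-theorem identity $\mathbb{E}[Q_\ell(\langle \textbf{x}, \textbf{y} \rangle)^2] = 1 / B(d, \ell)$, with $B(d, \ell)$ the dimension of degree-$\ell$ spherical harmonics, to put the off-diagonal entries on unit variance; the overall $1/\sqrt{n}$ rescaling is our illustrative choice rather than the paper's exact normalization.

```python
import numpy as np
from scipy.special import eval_gegenbauer

rng = np.random.default_rng(2)

def Q(ell, d, t):
    """Degree-ell Gegenbauer polynomial for the sphere S^{d-1}, normalized
    so that Q(1) = 1."""
    a = (d - 2) / 2.0
    return eval_gegenbauer(ell, a, t) / eval_gegenbauer(ell, a, 1.0)

d, ell, psi = 50, 2, 0.4
B = (d + 2) * (d - 1) // 2            # dim of degree-2 spherical harmonics
n = int(psi * B)                      # polynomial regime n ~ psi * d^ell

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
W = np.sqrt(B) * Q(ell, d, X @ X.T)   # unit-variance off-diagonal entries
np.fill_diagonal(W, 0.0)              # drop the Q(1) diagonal

eigs = np.linalg.eigvalsh(W / np.sqrt(n))
print(np.mean(eigs**2))               # ~ 1 here; compare with the MP prediction
```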